Behind the Scenes

Slides updated from DS002 course

Jason Xu

Welcome…to the world of Macbeth

Me in 6th grade:

The Data

“macbeth” from TidyTuesday!

library(tidyverse)  # attaches dplyr, stringr, and readr, so a separate library(dplyr) is not needed

macbeth <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-09-17/macbeth.csv')

head(macbeth)
# A tibble: 6 × 5
  act   scene   character         dialogue                           line_number
  <chr> <chr>   <chr>             <chr>                                    <dbl>
1 Act I Scene I [stage direction] Thunder and lightning. Enter thre…          NA
2 Act I Scene I First Witch       When shall we three meet again               1
3 Act I Scene I First Witch       In thunder, lightning, or in rain?           2
4 Act I Scene I Second Witch      When the hurlyburly's done,                  3
5 Act I Scene I Second Witch      When the battle's lost and won.              4
6 Act I Scene I Third Witch       That will be ere the set of sun.             5

Negative Words

Which lines of dialogue contain negative words?
I’ll call them negative lines.

# Flag each line that mentions blood, murder, death, or kill (case-insensitive)
neg_lines <- macbeth |>
  mutate( blood = str_detect( dialogue, regex( "blood|murder|death|kill", ignore_case = TRUE ) ) ) |>
  select( dialogue, blood )

head(neg_lines)
# A tibble: 6 × 2
  dialogue                                   blood
  <chr>                                      <lgl>
1 Thunder and lightning. Enter three Witches FALSE
2 When shall we three meet again             FALSE
3 In thunder, lightning, or in rain?         FALSE
4 When the hurlyburly's done,                FALSE
5 When the battle's lost and won.            FALSE
6 That will be ere the set of sun.           FALSE
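
As a quick check of how the pattern behaves, here is a minimal sketch on a few made-up strings (the `toy_lines` vector and `toy_flags` name are hypothetical, not part of the dataset). It suggests the match is case-insensitive and also catches longer words such as "bloody" or "Murderer", because the pattern has no word boundaries.

```r
library(stringr)

# Hypothetical example strings, not taken from the macbeth data
toy_lines <- c("BLOOD will have blood", "A bloody business",
               "Fair is foul", "the Murderer enters")

# Same word list as on the slide; regex() makes the match case-insensitive
toy_flags <- str_detect(toy_lines, regex("blood|murder|death|kill", ignore_case = TRUE))
toy_flags
# [1]  TRUE  TRUE FALSE  TRUE
```

If whole words only were wanted, wrapping the alternatives in `\\b(...)\\b` would be one option, at the cost of missing forms like "bloody".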

Negative Lines Per Act

Total number of negative lines per act

# Keep only the flagged lines, then count them within each act
neg_lines <- macbeth |>
  mutate( blood = str_detect( dialogue, regex( "blood|murder|death|kill", ignore_case = TRUE ) ) ) |>
  filter( blood ) |>
  group_by( act ) |>
  summarize( negative_lines_per_act = n() )

neg_lines
# A tibble: 5 × 2
  act     negative_lines_per_act
  <chr>                    <int>
1 Act I                       15
2 Act II                      31
3 Act III                     27
4 Act IV                      14
5 Act V                       10

String Length

String length is the number of characters in a string, counting letters, punctuation, and spaces.

It is one way to quantify the length of each line of dialogue.

string_length <- str_length( macbeth$dialogue )

head(string_length)
[1] 42 30 34 27 31 32
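
To make the counting concrete, here is a small sketch on the first spoken line from the data preview. The word count via `str_count()` is an alternative measure added here for comparison, not something the slides use; `first_line`, `n_chars`, and `n_words` are names introduced for illustration.

```r
library(stringr)

# First spoken line of the play, as shown in the data preview
first_line <- "When shall we three meet again"

# Number of characters, spaces included
n_chars <- str_length(first_line)

# Assumed alternative: number of whitespace-separated words
n_words <- str_count(first_line, "\\S+")

n_chars
# [1] 30
n_words
# [1] 6
```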

Average String Length per Act

average_character <- macbeth |>
  mutate( string_length = str_length( dialogue ) ) |>
  group_by( act ) |>
  summarize( average_character = mean( string_length ) ) 

average_character
# A tibble: 5 × 2
  act     average_character
  <chr>               <dbl>
1 Act I                36.3
2 Act II               33.9
3 Act III              36.1
4 Act IV               34.7
5 Act V                36.4

Is there a relationship?

Hypothesis: Characters in acts that contain more negative lines also tend to speak longer lines (i.e., longer string length).

# Join the two per-act summaries on their shared key
negative_words <- left_join( neg_lines, average_character, by = "act" )

head(negative_words)
# A tibble: 5 × 3
  act     negative_lines_per_act average_character
  <chr>                    <int>             <dbl>
1 Act I                       15              36.3
2 Act II                      31              33.9
3 Act III                     27              36.1
4 Act IV                      14              34.7
5 Act V                       10              36.4

Is there a relationship?

  • The opening and closing acts (I and V) have the longest dialogue lines on average, but few negative lines

  • If there is any relationship, the total number of negative lines per act seems to be negatively associated with the average dialogue length per act. But there is a confounding variable: some acts may be longer than others

  • Act II has the most negative lines, but the shortest dialogue lines on average
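
With only five acts any summary is rough, but one hedged way to put a number on the bullets above is a Pearson correlation of the two columns. This sketch retypes the five values from the joined table so that it runs on its own; `act_cor` is a name introduced here.

```r
# Values copied from the joined per-act table
negative_lines_per_act <- c(15, 31, 27, 14, 10)
average_character      <- c(36.3, 33.9, 36.1, 34.7, 36.4)

# Pearson correlation across the five acts
act_cor <- cor(negative_lines_per_act, average_character)
act_cor
```

The sign comes out negative, in line with the second bullet, though five points are far too few to draw a firm conclusion.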

Limitations

  • “Negative” words are subjective, and only a very limited set of negative words was looked at here.
  • The length of each act was not captured: some acts may be much longer, leading to more negative words.
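
One possible way to address the second limitation is to report the share of negative lines per act rather than the raw count. The sketch below uses a tiny made-up tibble (`toy_play`, hypothetical) in place of `macbeth`, so the numbers are illustrative only; the column names `lines_per_act` and `negative_share` are also assumptions.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for the macbeth tibble
toy_play <- tibble(
  act = c("Act I", "Act I", "Act I", "Act I", "Act II", "Act II"),
  dialogue = c("So foul and fair", "What bloody man is that?", "Hail!",
               "All hail!", "Is this a dagger", "Murder and treason!")
)

neg_share <- toy_play |>
  mutate( blood = str_detect( dialogue, regex( "blood|murder|death|kill", ignore_case = TRUE ) ) ) |>
  group_by( act ) |>
  summarize(
    lines_per_act  = n(),           # act length, in lines
    negative_share = mean( blood )  # proportion of lines flagged as negative
  )

neg_share
```

Swapping `toy_play` for `macbeth` should give a per-act proportion that is comparable across acts of different lengths.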

What about “love”???

Does Macbeth ever verbally express “love” in the play? And to whom?

# Keep Macbeth's lines and extract whatever follows "love " (case-insensitive lookbehind)
macbeth |>
  filter( character == "Macbeth" ) |>
  mutate( love_strings = str_extract( dialogue, "(?<=(?i)love ).+") ) |>
  filter( !is.na(love_strings) ) |>
  select( character:love_strings, -line_number )
# A tibble: 4 × 3
  character dialogue                                             love_strings   
  <chr>     <chr>                                                <chr>          
1 Macbeth   Safe toward your love and honour.                    and honour.    
2 Macbeth   Courage to make 's love kno wn?                      kno wn?        
3 Macbeth   Grapples you to the heart and love of us,            of us,         
4 Macbeth   To those that know me. Come, love and health to all; and health to …

“love” followed by further text appears in only 4 of Macbeth’s lines, and each time in a generic setting (“love and honour”, “love and health to all”, which sound like words said when toasting). Poor Lady Macbeth :(
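
One caveat about the pattern: the lookbehind requires a space right after "love", so a line where the word is followed by punctuation or ends the line would not be picked up. A minimal sketch on made-up strings (`toy_love` and `love_after` are hypothetical names, and the strings are for illustration only):

```r
library(stringr)

# Hypothetical example strings for illustration
toy_love <- c("Come, love and health to all;", "my dearest love,", "Love me not")

# Text after "love " (case-insensitive); NA when "love" is not followed by a space
love_after <- str_extract(toy_love, "(?i)(?<=love ).+")
love_after
# [1] "and health to all;" NA                   "me not"
```

A looser check such as `str_detect(dialogue, regex("\\blove\\b", ignore_case = TRUE))` might be worth trying before concluding that the count is complete.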

References

Picture taken from: https://medium.com/literary-analyses/lady-macbeth-30e549b7c211
Data taken from TidyTuesday

GitHub Repository: https://github.com/rfordatascience/tidytuesday/tree/main/data
Data originally sourced from: https://shakespeare.mit.edu/